Extracting pronunciation rules for phonemic variants
Abstract
Various automated techniques can be used to generalise from phonemic lexicons through the extraction of grapheme-to-phoneme rule sets. These techniques are particularly useful when developing pronunciation models for previously unmodelled languages: a frequent requirement when developing multilingual speech processing systems. However, many of the learning algorithms (such as Dynamically Expanding Context or Default&Refine) experience difficulty in accommodating alternate pronunciations that occur in the training lexicon. In this paper we propose an approach for the incorporation of phonemic variants in a typical instance-based learning algorithm, Default&Refine. We investigate the use of a combined ‘pseudo-phoneme’ associated with a set of ‘generation restriction rules’ to model those phonemes that are consistently realised as two or more variants in the training lexicon. We evaluate the effectiveness of this approach using the Oxford Advanced Learner's Dictionary, a publicly available English pronunciation lexicon. We find that phonemic variation exhibits sufficient regularity to be modelled through extracted rules, and that acceptable variants may be underrepresented in the studied lexicon. The proposed method is applicable to many approaches besides the Default&Refine algorithm, and provides a simple but effective technique for including phonemic variants in grapheme-to-phoneme rule extraction frameworks.
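The pseudo-phoneme idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `merge_variants` and `expand`, the `|` separator, the phoneme symbols, and the choice to model a restriction as a plain set of allowed expansions are all assumptions made for illustration. It also assumes that the variants of a word are already aligned and of equal length.

```python
from itertools import product


def merge_variants(variants):
    """Collapse aligned, equal-length variants into one pseudo-phoneme sequence.

    Positions where the variants disagree become a combined symbol such as
    "ae|ax" (hypothetical notation), so a rule learner sees one target per word.
    """
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("this sketch assumes pre-aligned, equal-length variants")
    merged = []
    for column in zip(*variants):
        distinct = sorted(set(column))
        merged.append(distinct[0] if len(distinct) == 1 else "|".join(distinct))
    return merged


def expand(pseudo_sequence, allowed=None):
    """Expand pseudo-phonemes back into concrete variants.

    `allowed` is an assumed stand-in for restriction rules: when given, only
    expansions listed in it (as tuples) are kept.
    """
    options = [symbol.split("|") for symbol in pseudo_sequence]
    candidates = [list(combo) for combo in product(*options)]
    if allowed is not None:
        candidates = [c for c in candidates if tuple(c) in allowed]
    return candidates


# Two hypothetical variants of one word, differing in the first phoneme.
merged = merge_variants([["ax", "n", "d"], ["ae", "n", "d"]])
all_variants = expand(merged)
restricted = expand(merged, allowed={("ax", "n", "d")})
```

When a word contains more than one pseudo-phoneme, unrestricted expansion yields every combination of the component phonemes, including combinations that never occurred in the lexicon. Some form of restriction on generation is therefore needed, which the `allowed` set stands in for here.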
Similar resources
Automatic generation of Korean pronunciation variants by multistage applications of phonological rules
Phonetic transcriptions are often manually encoded in a pronunciation lexicon. This process is time consuming and requires linguistic expertise. Moreover, it is very difficult to maintain consistency. To handle these problems, we present a model that produces Korean pronunciation variants based on morphophonological analysis. By analyzing phonological variations frequently found in spoken Korea...
Developing consistent pronunciation models for phonemic variants
Pronunciation lexicons often contain pronunciation variants. This can create two problems: It can be difficult to define these variants in an internally consistent way and it can also be difficult to extract generalised grapheme-to-phoneme rule sets from a lexicon containing variants. In this paper we address both these issues by creating ‘pseudo-phonemes’ associated with sets of ‘generation re...
Generation and Selection of Pronunciation Variants for a Flexible Word Recognizer
This paper presents an approach for the generation and selection of pronunciation transcriptions for a flexible word recognizer. The basic idea is to produce pronunciation variants and corresponding scores with a set of pronunciation variation rules, which are weighted with their frequencies of occurrence measured on the training data. This approach addresses the problem of interfering transcripti...
Statistical Analysis of Korean Pronunciation Variations
In this paper, we present a statistical analysis of Korean pronunciation variations using a Grapheme-to-Phoneme (GTP) system. The GTP system generates pronunciation variants by applying rules modeling obligatory and optional phonemic changes and allophonic changes in spoken Korean. Experimental results using a PBS (Phonetically Balanced Sentence) Speech DB of 60,000 sentences show that the most...
Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
In this research, we propose a hybrid approach for acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized and non-vocalized Arabic resources, based on the fact that the amount of non-vocalized resources is always higher than vocalized resources. Two speech recognition baseline systems were built: phonemic and graphemic. The two baseli...